Overview

Dataset statistics

Number of variables: 11
Number of observations: 380
Missing cells: 3
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 32.8 KiB
Average record size in memory: 88.3 B
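The missing-cell percentage is taken over all cells (observations × variables). The short sketch below reproduces that arithmetic and a duplicate-row count on a plain list of rows. It is an illustration, not pandas-profiling's own code, and it assumes a cell counts as missing when it is None.

```python
def missing_and_duplicates(rows):
    """Return (missing cells, missing %, duplicate rows) for a list of equal-length tuples.

    Assumption: a cell is "missing" when it is None.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    missing = sum(cell is None for row in rows for cell in row)
    total_cells = n_rows * n_cols
    missing_pct = 100.0 * missing / total_cells if total_cells else 0.0
    # A row counts as a duplicate if an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    return missing, missing_pct, duplicates


# Worked check against the overview: 3 missing cells in 380 rows x 11 columns.
print(round(100.0 * 3 / (380 * 11), 1))  # 0.1
```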

Variable types

Numeric: 8
Categorical: 3

Alerts

City has a high cardinality: 366 distinct values [High cardinality]
Male Population is highly correlated with Female Population and 3 other fields [High correlation]
Female Population is highly correlated with Male Population and 3 other fields [High correlation]
Total Population is highly correlated with Male Population and 3 other fields [High correlation]
Number of Veterans is highly correlated with Male Population and 2 other fields [High correlation]
Foreign-born is highly correlated with Male Population and 2 other fields [High correlation]
Male Population is highly correlated with Female Population and 3 other fields [High correlation]
Female Population is highly correlated with Male Population and 3 other fields [High correlation]
Total Population is highly correlated with Male Population and 3 other fields [High correlation]
Number of Veterans is highly correlated with Male Population and 3 other fields [High correlation]
Foreign-born is highly correlated with Male Population and 3 other fields [High correlation]
Male Population is highly correlated with Female Population and 2 other fields [High correlation]
Female Population is highly correlated with Male Population and 2 other fields [High correlation]
Total Population is highly correlated with Male Population and 2 other fields [High correlation]
Number of Veterans is highly correlated with Male Population and 2 other fields [High correlation]
State Code is highly correlated with State [High correlation]
State is highly correlated with State Code [High correlation]
State is highly correlated with State Code [High correlation]
Male Population is highly correlated with Female Population and 3 other fields [High correlation]
Female Population is highly correlated with Male Population and 3 other fields [High correlation]
Total Population is highly correlated with Male Population and 3 other fields [High correlation]
Number of Veterans is highly correlated with Male Population and 3 other fields [High correlation]
Foreign-born is highly correlated with Male Population and 3 other fields [High correlation]
State Code is highly correlated with State [High correlation]
City is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
Total Population has unique values [Unique]

Reproduction

Analysis started: 2022-11-06 14:36:39.756121
Analysis finished: 2022-11-06 14:37:14.262884
Duration: 34.51 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIQUE

Distinct: 380
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 495.3552632
Minimum: 1
Maximum: 1963
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 31.9
Q1: 162.75
Median: 381.5
Q3: 742.5
95th percentile: 1400.7
Maximum: 1963
Range: 1962
Interquartile range (IQR): 579.75

Descriptive statistics

Standard deviation: 416.141451
Coefficient of variation (CV): 0.8400868668
Kurtosis: 0.8414430476
Mean: 495.3552632
Median Absolute Deviation (MAD): 257.5
Skewness: 1.120269433
Sum: 188235
Variance: 173173.7072
Monotonicity: Strictly increasing
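Several of these statistics are simple functions of the data. The sketch below computes them with the standard library under stated assumptions: linear-interpolation quantiles (the NumPy and pandas default), the sample standard deviation (n - 1 denominator) for the coefficient of variation, and MAD taken as the median absolute deviation from the median. Skewness and kurtosis are omitted. The last line checks the reported CV for Unnamed: 0 from its reported standard deviation and mean.

```python
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already-sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def describe(values):
    xs = sorted(values)
    mean = statistics.mean(xs)
    std = statistics.stdev(xs)  # sample standard deviation (n - 1)
    med = statistics.median(xs)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "range": xs[-1] - xs[0],
        "Q1": q1,
        "median": med,
        "Q3": q3,
        "IQR": q3 - q1,
        "CV": std / mean,
        # Median absolute deviation from the median.
        "MAD": statistics.median(abs(x - med) for x in xs),
        "variance": std ** 2,
    }


stats = describe(range(1, 11))
print(stats["Q1"], stats["Q3"], stats["IQR"], stats["MAD"])  # 3.25 7.75 4.5 2.5

# Reported values for Unnamed: 0: CV = standard deviation / mean.
print(round(416.141451 / 495.3552632, 6))  # 0.840087
```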
Histogram with fixed size bins (bins=50)
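A fixed-size-bin histogram splits the observed range [min, max] into equal-width bins (50 here). A minimal sketch of equal-width binning, assuming the same edge convention as numpy.histogram (every bin half-open except the last, which also includes the maximum):

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate range: widen it by 0.5 on each side, as numpy.histogram does.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right so the maximum is counted.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = fixed_width_histogram(list(range(11)), bins=5)
print(counts)  # [2, 2, 2, 2, 3]
```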
Common values
Value               Count   Frequency (%)
1                   1       0.3%
609                 1       0.3%
639                 1       0.3%
638                 1       0.3%
634                 1       0.3%
631                 1       0.3%
625                 1       0.3%
624                 1       0.3%
616                 1       0.3%
615                 1       0.3%
Other values (370)  370     97.4%

Minimum 10 values
Value               Count   Frequency (%)
1                   1       0.3%
3                   1       0.3%
4                   1       0.3%
5                   1       0.3%
6                   1       0.3%
7                   1       0.3%
8                   1       0.3%
9                   1       0.3%
10                  1       0.3%
12                  1       0.3%

Maximum 10 values
Value               Count   Frequency (%)
1963                1       0.3%
1931                1       0.3%
1774                1       0.3%
1762                1       0.3%
1657                1       0.3%
1651                1       0.3%
1645                1       0.3%
1609                1       0.3%
1604                1       0.3%
1581                1       0.3%

City
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 366
Distinct (%): 96.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB
Bloomington: 3
Peoria: 2
Glendale: 2
Pasadena: 2
Union City: 2
Other values (361): 369

Length

Max length: 20
Median length: 16
Mean length: 9.023684211
Min length: 2

Characters and Unicode

Total characters: 3429
Distinct characters: 53
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
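As an illustration of how such properties feed the category tables below, Python's standard unicodedata module returns each code point's General Category as a two-letter code. The mapping from codes to readable names covers only the categories that appear in this report and is an assumption of this sketch, not the profiler's code.

```python
import unicodedata
from collections import Counter

# Readable names for the General Category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(strings):
    """Count characters by Unicode General Category across all strings."""
    counts = Counter()
    for s in strings:
        for ch in s:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["Winston-Salem", "Union City", "O'Fallon"]))
```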

Unique

Unique: 353
Unique (%): 92.9%

Sample

1st row: Peoria
2nd row: Hampton
3rd row: Lakewood
4th row: Mesa
5th row: Bryan

Common Values

Value               Count   Frequency (%)
Bloomington         3       0.8%
Peoria              2       0.5%
Glendale            2       0.5%
Pasadena            2       0.5%
Union City          2       0.5%
Concord             2       0.5%
Aurora              2       0.5%
Jacksonville        2       0.5%
Richmond            2       0.5%
Rochester           2       0.5%
Other values (356)  359     94.5%

Length

Histogram of category lengths
Words

Value               Count   Frequency (%)
san                 13      2.6%
beach               10      2.0%
city                8       1.6%
santa               7       1.4%
valley              6       1.2%
new                 4       0.8%
fort                4       0.8%
saint               4       0.8%
hills               4       0.8%
el                  3       0.6%
Other values (401)  441     87.5%
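The word table counts lowercase words across all values of the variable. A rough sketch, assuming simple lowercasing and whitespace tokenization; the profiler may additionally strip punctuation or drop stop words, so its counts can differ.

```python
from collections import Counter


def word_counts(values):
    """Count lowercase, whitespace-separated words across a list of strings."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


cities = ["San Jose", "San Diego", "Long Beach", "Huntington Beach"]
print(word_counts(cities)["san"])  # 2
```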

Most occurring characters

Value               Count   Frequency (%)
a                   345     10.1%
e                   299     8.7%
n                   276     8.0%
o                   260     7.6%
l                   219     6.4%
i                   214     6.2%
r                   200     5.8%
t                   160     4.7%
s                   132     3.8%
[space]             124     3.6%
Other values (43)   1200    35.0%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    2792    81.4%
Uppercase Letter    508     14.8%
Space Separator     124     3.6%
Dash Punctuation    3       0.1%
Other Punctuation   2       0.1%

Most frequent character per category

Lowercase Letter
Value               Count   Frequency (%)
a                   345     12.4%
e                   299     10.7%
n                   276     9.9%
o                   260     9.3%
l                   219     7.8%
i                   214     7.7%
r                   200     7.2%
t                   160     5.7%
s                   132     4.7%
d                   93      3.3%
Other values (16)   594     21.3%

Uppercase Letter
Value               Count   Frequency (%)
C                   56      11.0%
S                   52      10.2%
B                   43      8.5%
P                   39      7.7%
A                   32      6.3%
R                   30      5.9%
M                   28      5.5%
L                   28      5.5%
F                   24      4.7%
G                   21      4.1%
Other values (14)   155     30.5%

Space Separator
Value               Count   Frequency (%)
[space]             124     100.0%

Dash Punctuation
Value               Count   Frequency (%)
-                   3       100.0%

Other Punctuation
Value               Count   Frequency (%)
'                   2       100.0%

Most occurring scripts

Value               Count   Frequency (%)
Latin               3300    96.2%
Common              129     3.8%

Most frequent character per script

Latin
Value               Count   Frequency (%)
a                   345     10.5%
e                   299     9.1%
n                   276     8.4%
o                   260     7.9%
l                   219     6.6%
i                   214     6.5%
r                   200     6.1%
t                   160     4.8%
s                   132     4.0%
d                   93      2.8%
Other values (40)   1102    33.4%

Common
Value               Count   Frequency (%)
[space]             124     96.1%
-                   3       2.3%
'                   2       1.6%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               3429    100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
a                   345     10.1%
e                   299     8.7%
n                   276     8.0%
o                   260     7.6%
l                   219     6.4%
i                   214     6.2%
r                   200     5.8%
t                   160     4.7%
s                   132     3.8%
[space]             124     3.6%
Other values (43)   1200    35.0%

State
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 17
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB
California: 114
Texas: 48
Florida: 41
Illinois: 19
Michigan: 15
Other values (12): 143

Length

Max length: 14
Median length: 13
Mean length: 8.463157895
Min length: 4

Characters and Unicode

Total characters: 3216
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Illinois
2nd row: Virginia
3rd row: Colorado
4th row: Arizona
5th row: Texas

Common Values

Value               Count   Frequency (%)
California          114     30.0%
Texas               48      12.6%
Florida             41      10.8%
Illinois            19      5.0%
Michigan            15      3.9%
Colorado            15      3.9%
Arizona             14      3.7%
Washington          14      3.7%
Virginia            13      3.4%
Massachusetts       13      3.4%
Other values (7)    74      19.5%

Length

Histogram of category lengths
Words

Value               Count   Frequency (%)
california          114     27.5%
texas               48      11.6%
florida             41      9.9%
new                 22      5.3%
illinois            19      4.6%
michigan            15      3.6%
colorado            15      3.6%
arizona             14      3.4%
washington          14      3.4%
massachusetts       13      3.1%
Other values (9)    99      23.9%

Most occurring characters

Value               Count   Frequency (%)
a                   488     15.2%
i                   446     13.9%
o                   302     9.4%
n                   265     8.2%
r                   253     7.9%
l                   230     7.2%
s                   154     4.8%
C                   141     4.4%
e                   115     3.6%
f                   114     3.5%
Other values (23)   708     22.0%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    2768    86.1%
Uppercase Letter    414     12.9%
Space Separator     34      1.1%

Most frequent character per category

Lowercase Letter
Value               Count   Frequency (%)
a                   488     17.6%
i                   446     16.1%
o                   302     10.9%
n                   265     9.6%
r                   253     9.1%
l                   230     8.3%
s                   154     5.6%
e                   115     4.2%
f                   114     4.1%
d                   76      2.7%
Other values (10)   325     11.7%

Uppercase Letter
Value               Count   Frequency (%)
C                   141     34.1%
M                   48      11.6%
T                   48      11.6%
F                   41      9.9%
N                   34      8.2%
I                   29      7.0%
W                   14      3.4%
A                   14      3.4%
V                   13      3.1%
Y                   11      2.7%
Other values (2)    21      5.1%

Space Separator
Value               Count   Frequency (%)
[space]             34      100.0%

Most occurring scripts

Value               Count   Frequency (%)
Latin               3182    98.9%
Common              34      1.1%

Most frequent character per script

Latin
Value               Count   Frequency (%)
a                   488     15.3%
i                   446     14.0%
o                   302     9.5%
n                   265     8.3%
r                   253     8.0%
l                   230     7.2%
s                   154     4.8%
C                   141     4.4%
e                   115     3.6%
f                   114     3.6%
Other values (22)   674     21.2%

Common
Value               Count   Frequency (%)
[space]             34      100.0%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               3216    100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
a                   488     15.2%
i                   446     13.9%
o                   302     9.4%
n                   265     8.2%
r                   253     7.9%
l                   230     7.2%
s                   154     4.8%
C                   141     4.4%
e                   115     3.6%
f                   114     3.5%
Other values (23)   708     22.0%

Median Age
Real number (ℝ≥0)

Distinct: 157
Distinct (%): 41.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.81736842
Minimum: 22.9
Maximum: 70.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 22.9
5th percentile: 28.7
Q1: 32.8
Median: 35.4
Q3: 38.5
95th percentile: 43.755
Maximum: 70.5
Range: 47.6
Interquartile range (IQR): 5.7

Descriptive statistics

Standard deviation: 4.911454859
Coefficient of variation (CV): 0.1371249502
Kurtosis: 6.015550911
Mean: 35.81736842
Median Absolute Deviation (MAD): 2.7
Skewness: 1.053303375
Sum: 13610.6
Variance: 24.12238883
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
35.5                8       2.1%
38.1                7       1.8%
32.8                7       1.8%
35.7                7       1.8%
33.1                7       1.8%
33.4                6       1.6%
34.2                6       1.6%
34.5                5       1.3%
33.8                5       1.3%
35.3                5       1.3%
Other values (147)  317     83.4%

Minimum 10 values
Value               Count   Frequency (%)
22.9                1       0.3%
23.5                1       0.3%
23.9                1       0.3%
24.2                1       0.3%
26                  1       0.3%
26.2                2       0.5%
26.3                1       0.3%
26.4                1       0.3%
26.9                2       0.5%
27.4                1       0.3%

Maximum 10 values
Value               Count   Frequency (%)
70.5                1       0.3%
48.8                1       0.3%
47.9                1       0.3%
47.4                1       0.3%
47.3                1       0.3%
47                  1       0.3%
46.9                1       0.3%
46.8                1       0.3%
45.9                1       0.3%
45.8                2       0.5%

Male Population
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 379
Distinct (%): 100.0%
Missing: 1
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 98011.33245
Minimum: 29995
Maximum: 4081698
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 29995
5th percentile: 32282.9
Q1: 38663
Median: 50405
Q3: 79442
95th percentile: 267806.4
Maximum: 4081698
Range: 4051703
Interquartile range (IQR): 40779

Descriptive statistics

Standard deviation: 241682.256
Coefficient of variation (CV): 2.465860324
Kurtosis: 198.6629887
Mean: 98011.33245
Median Absolute Deviation (MAD): 14434
Skewness: 12.71059724
Sum: 37146295
Variance: 5.841031286 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
56229               1       0.3%
49186               1       0.3%
50680               1       0.3%
32211               1       0.3%
39221               1       0.3%
90192               1       0.3%
57961               1       0.3%
40015               1       0.3%
42100               1       0.3%
43314               1       0.3%
Other values (369)  369     97.1%

Minimum 10 values
Value               Count   Frequency (%)
29995               1       0.3%
30193               1       0.3%
30758               1       0.3%
30844               1       0.3%
30890               1       0.3%
31205               1       0.3%
31248               1       0.3%
31369               1       0.3%
31382               1       0.3%
31395               1       0.3%

Maximum 10 values
Value               Count   Frequency (%)
4081698             1       0.3%
1320015             1       0.3%
1149686             1       0.3%
786833              1       0.3%
721405              1       0.3%
693826              1       0.3%
639019              1       0.3%
518317              1       0.3%
475718              1       0.3%
439752              1       0.3%

Female Population
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 378
Distinct (%): 99.7%
Missing: 1
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 102475.8945
Minimum: 27348
Maximum: 4468707
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 27348
5th percentile: 33947.4
Q1: 40886
Median: 51265
Q3: 83314
95th percentile: 272885.1
Maximum: 4468707
Range: 4441359
Interquartile range (IQR): 42428

Descriptive statistics

Standard deviation: 260279.9006
Coefficient of variation (CV): 2.53991343
Kurtosis: 212.6129316
Mean: 102475.8945
Median Absolute Deviation (MAD): 14172
Skewness: 13.25191261
Sum: 38838364
Variance: 6.774562664 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
35801               2       0.5%
62432               1       0.3%
56128               1       0.3%
35234               1       0.3%
36911               1       0.3%
92175               1       0.3%
58784               1       0.3%
27348               1       0.3%
46407               1       0.3%
45456               1       0.3%
Other values (368)  368     96.8%

Minimum 10 values
Value               Count   Frequency (%)
27348               1       0.3%
31456               1       0.3%
32173               1       0.3%
32397               1       0.3%
32763               1       0.3%
32799               1       0.3%
32807               1       0.3%
32901               1       0.3%
33094               1       0.3%
33310               1       0.3%

Maximum 10 values
Value               Count   Frequency (%)
4468707             1       0.3%
1400541             1       0.3%
1148942             1       0.3%
776168              1       0.3%
748419              1       0.3%
701081              1       0.3%
661063              1       0.3%
508602              1       0.3%
456122              1       0.3%
448828              1       0.3%

Total Population
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIQUE

Distinct: 380
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 200150.6553
Minimum: 63651
Maximum: 8550405
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 63651
5th percentile: 67358.4
Q1: 79001.25
Median: 103332
Q3: 161607.5
95th percentile: 536182.75
Maximum: 8550405
Range: 8486754
Interquartile range (IQR): 82606.25

Descriptive statistics

Standard deviation: 501249.5923
Coefficient of variation (CV): 2.504361485
Kurtosis: 206.4515152
Mean: 200150.6553
Median Absolute Deviation (MAD): 28802
Skewness: 13.00940053
Sum: 76057249
Variance: 2.512511537 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
118661              1       0.3%
102467              1       0.3%
100884              1       0.3%
67445               1       0.3%
76132               1       0.3%
182367              1       0.3%
116745              1       0.3%
67363               1       0.3%
88507               1       0.3%
88770               1       0.3%
Other values (370)  370     97.4%

Minimum 10 values
Value               Count   Frequency (%)
63651               1       0.3%
63792               1       0.3%
64609               1       0.3%
64819               1       0.3%
64837               1       0.3%
64962               1       0.3%
65065               1       0.3%
65265               1       0.3%
65299               1       0.3%
65532               1       0.3%

Maximum 10 values
Value               Count   Frequency (%)
8550405             1       0.3%
2720556             1       0.3%
2298628             1       0.3%
1563001             1       0.3%
1469824             1       0.3%
1394907             1       0.3%
1300082             1       0.3%
1026919             1       0.3%
931840              1       0.3%
868031              1       0.3%

Number of Veterans
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 375
Distinct (%): 98.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9073.757895
Minimum: 416
Maximum: 156961
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 416
5th percentile: 1669.75
Q1: 3529.25
Median: 4980.5
Q3: 8392.5
95th percentile: 29448.3
Maximum: 156961
Range: 156545
Interquartile range (IQR): 4863.25

Descriptive statistics

Standard deviation: 14435.63488
Coefficient of variation (CV): 1.590921319
Kurtosis: 40.78164889
Mean: 9073.757895
Median Absolute Deviation (MAD): 2140
Skewness: 5.544698286
Sum: 3448028
Variance: 208387554.3
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
3404                2       0.5%
5204                2       0.5%
1376                2       0.5%
3027                2       0.5%
4211                2       0.5%
4973                1       0.3%
6888                1       0.3%
4957                1       0.3%
2126                1       0.3%
2855                1       0.3%
Other values (365)  365     96.1%

Minimum 10 values
Value               Count   Frequency (%)
416                 1       0.3%
629                 1       0.3%
693                 1       0.3%
705                 1       0.3%
724                 1       0.3%
897                 1       0.3%
1066                1       0.3%
1101                1       0.3%
1131                1       0.3%
1190                1       0.3%

Maximum 10 values
Value               Count   Frequency (%)
156961              1       0.3%
109089              1       0.3%
92489               1       0.3%
75432               1       0.3%
72388               1       0.3%
72042               1       0.3%
71898               1       0.3%
54995               1       0.3%
49291               1       0.3%
47693               1       0.3%

Foreign-born
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 379
Distinct (%): 99.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45226.97632
Minimum: 1058
Maximum: 3212500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 1058
5th percentile: 4175.55
Q1: 12056
Median: 21120.5
Q3: 34742.5
95th percentile: 113255.7
Maximum: 3212500
Range: 3211442
Interquartile range (IQR): 22686.5

Descriptive statistics

Standard deviation: 175284.3482
Coefficient of variation (CV): 3.875659229
Kurtosis: 283.6036436
Mean: 45226.97632
Median Absolute Deviation (MAD): 11092
Skewness: 15.93392508
Sum: 17186251
Variance: 3.072460271 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
5757                2       0.5%
12070               1       0.3%
28133               1       0.3%
29691               1       0.3%
40666               1       0.3%
59655               1       0.3%
3732                1       0.3%
6630                1       0.3%
14775               1       0.3%
9704                1       0.3%
Other values (369)  369     97.1%

Minimum 10 values
Value               Count   Frequency (%)
1058                1       0.3%
1062                1       0.3%
1224                1       0.3%
1531                1       0.3%
1815                1       0.3%
1884                1       0.3%
2138                1       0.3%
2204                1       0.3%
2258                1       0.3%
2829                1       0.3%

Maximum 10 values
Value               Count   Frequency (%)
3212500             1       0.3%
696210              1       0.3%
573463              1       0.3%
401493              1       0.3%
373842              1       0.3%
326825              1       0.3%
300702              1       0.3%
297199              1       0.3%
260789              1       0.3%
208046              1       0.3%

Average Household Size
Real number (ℝ≥0)

Distinct: 143
Distinct (%): 37.7%
Missing: 1
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.806965699
Minimum: 2
Maximum: 4.98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 2
5th percentile: 2.24
Q1: 2.48
Median: 2.7
Q3: 3.04
95th percentile: 3.673
Maximum: 4.98
Range: 2.98
Interquartile range (IQR): 0.56

Descriptive statistics

Standard deviation: 0.46817616
Coefficient of variation (CV): 0.1667908376
Kurtosis: 2.31726247
Mean: 2.806965699
Median Absolute Deviation (MAD): 0.27
Skewness: 1.286587442
Sum: 1063.84
Variance: 0.2191889168
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
2.72                9       2.4%
2.73                8       2.1%
2.85                8       2.1%
2.64                7       1.8%
3.13                6       1.6%
2.52                6       1.6%
2.78                6       1.6%
2.45                6       1.6%
2.97                6       1.6%
2.59                6       1.6%
Other values (133)  311     81.8%

Minimum 10 values
Value               Count   Frequency (%)
2                   1       0.3%
2.06                1       0.3%
2.08                2       0.5%
2.13                1       0.3%
2.17                2       0.5%
2.18                2       0.5%
2.19                1       0.3%
2.2                 1       0.3%
2.21                1       0.3%
2.22                3       0.8%

Maximum 10 values
Value               Count   Frequency (%)
4.98                1       0.3%
4.78                1       0.3%
4.58                1       0.3%
4.57                1       0.3%
4.15                1       0.3%
4.08                2       0.5%
3.97                1       0.3%
3.93                1       0.3%
3.9                 1       0.3%
3.89                1       0.3%

State Code
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 17
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB
CA: 114
TX: 48
FL: 41
IL: 19
MI: 15
Other values (12): 143

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 760
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IL
2nd row: VA
3rd row: CO
4th row: AZ
5th row: TX

Common Values

Value               Count   Frequency (%)
CA                  114     30.0%
TX                  48      12.6%
FL                  41      10.8%
IL                  19      5.0%
MI                  15      3.9%
CO                  15      3.9%
AZ                  14      3.7%
WA                  14      3.7%
VA                  13      3.4%
MA                  13      3.4%
Other values (7)    74      19.5%

Length

Histogram of category lengths
Words

Value               Count   Frequency (%)
ca                  114     30.0%
tx                  48      12.6%
fl                  41      10.8%
il                  19      5.0%
mi                  15      3.9%
co                  15      3.9%
az                  14      3.7%
wa                  14      3.7%
ma                  13      3.4%
va                  13      3.4%
Other values (7)    74      19.5%

Most occurring characters

Value               Count   Frequency (%)
A                   168     22.1%
C                   141     18.6%
L                   60      7.9%
N                   54      7.1%
T                   48      6.3%
X                   48      6.3%
M                   48      6.3%
I                   44      5.8%
F                   41      5.4%
O                   25      3.3%
Other values (7)    83      10.9%

Most occurring categories

Value               Count   Frequency (%)
Uppercase Letter    760     100.0%

Most frequent character per category

Uppercase Letter
Value               Count   Frequency (%)
A                   168     22.1%
C                   141     18.6%
L                   60      7.9%
N                   54      7.1%
T                   48      6.3%
X                   48      6.3%
M                   48      6.3%
I                   44      5.8%
F                   41      5.4%
O                   25      3.3%
Other values (7)    83      10.9%

Most occurring scripts

Value               Count   Frequency (%)
Latin               760     100.0%

Most frequent character per script

Latin
Value               Count   Frequency (%)
A                   168     22.1%
C                   141     18.6%
L                   60      7.9%
N                   54      7.1%
T                   48      6.3%
X                   48      6.3%
M                   48      6.3%
I                   44      5.8%
F                   41      5.4%
O                   25      3.3%
Other values (7)    83      10.9%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               760     100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
A                   168     22.1%
C                   141     18.6%
L                   60      7.9%
N                   54      7.1%
T                   48      6.3%
X                   48      6.3%
M                   48      6.3%
I                   44      5.8%
F                   41      5.4%
O                   25      3.3%
Other values (7)    83      10.9%

Interactions

[Figure: 64 pairwise interaction plots between the numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
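A minimal sketch of that definition: rank each variable (tied values receive the average of the ranks they span), then take the Pearson correlation of the ranks. Pure Python keeps the steps visible; the helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    # Covariance of the ranks divided by the product of their standard deviations.
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
print(spearman(x, [v ** 3 for v in x]))  # 1.0 (monotonic but nonlinear)
```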

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for data lying exactly on a line that is neither horizontal nor vertical, r is +1 or -1 regardless of the line's slope.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
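The formula translates directly into code. The sketch below also illustrates the invariance property: shifting or positively rescaling one variable leaves r unchanged, and points lying exactly on a sloped line give r = +1 or -1 whatever the slope. The function name and sample data are illustrative.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) normalizing factors cancel, so raw sums suffice.
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.7]

r = pearson(x, y)
r_shifted_scaled = pearson([10 * a + 3 for a in x], y)  # change of location and scale
print(abs(r - r_shifted_scaled) < 1e-12)              # True
print(pearson(x, [0.5 * a - 1 for a in x]))           # 1.0 for a shallow rising line
print(pearson(x, [-40 * a + 7 for a in x]))           # -1.0 for a steep falling line
```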

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2, where n is the number of observations.
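A direct O(n²) transcription of that description, using the tie-free form often called τ-a. This is an illustrative sketch; library routines may apply a tie correction (τ-b), so their values can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Pairs tied in x or in y count as neither concordant nor discordant.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# 5 concordant pairs and 1 discordant pair out of 6: tau = 4 / 6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```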

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimator of Cramér's V is known to be biased, even for large samples, so a bias-corrected measure proposed by Bergsma (2013) is used instead.
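A sketch of a bias-corrected estimator, following the commonly cited form of Bergsma's correction: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and clipped at zero, and the row and column counts are replaced by r - (r-1)²/(n-1) and k - (k-1)²/(n-1). This is an illustrative pure-Python implementation, not pandas-profiling's code.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V (Bergsma 2013) for two equal-length lists of labels."""
    n = len(x)
    rows, cols = set(x), set(y)
    r, k = len(rows), len(cols)
    observed = Counter(zip(x, y))
    row_tot, col_tot = Counter(x), Counter(y)
    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a in rows:
        for b in cols:
            expected = row_tot[a] * col_tot[b] / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


labels = ["a", "b"] * 50
print(round(cramers_v_corrected(labels, labels), 6))  # 1.0 (perfect association)
```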

Phik (φk)

Phik (φk) is a newer correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reduces to the Pearson correlation coefficient for a bivariate normal input distribution. The phik package is extensively documented.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
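Nullity correlation can be read as the Pearson correlation between missingness indicators (1 where a value is missing, 0 otherwise). A sketch under that assumption, in the spirit of the missingno heatmap; columns that are never or always missing have no variance, so this helper returns None for them.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the missingness indicators of two columns (None = missing)."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    if saa == 0 or sbb == 0:
        return None  # never (or always) missing: nullity correlation is undefined
    return sab / math.sqrt(saa * sbb)


print(nullity_correlation([1, None, 3, None], [5, None, 7, None]))  # 1.0: missing in the same rows
print(nullity_correlation([1, None, 3, None], [None, 2, None, 4]))  # -1.0: never missing together
```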
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

     Unnamed: 0  City              State       Median Age  Male Population  Female Population  Total Population  Number of Veterans  Foreign-born  Average Household Size  State Code
0    1           Peoria            Illinois    33.1        56229.0          62432.0            118661            6634.0              7517.0        2.40                    IL
1    3           Hampton           Virginia    35.5        66214.0          70240.0            136454            19638.0             6204.0        2.48                    VA
2    4           Lakewood          Colorado    37.7        76013.0          76576.0            152589            9988.0              14169.0       2.29                    CO
3    5           Mesa              Arizona     36.9        234998.0         236835.0           471833            31808.0             57492.0       2.68                    AZ
4    6           Bryan             Texas       29.4        41761.0          40345.0            82106             3602.0              12014.0       2.55                    TX
5    7           Garland           Texas       34.5        116406.0         120430.0           236836            10407.0             62975.0       3.12                    TX
6    8           Springfield       Illinois    38.8        55639.0          62170.0            117809            7525.0              4264.0        2.22                    IL
7    9           Flint             Michigan    35.3        48984.0          49313.0            98297             3757.0              2138.0        2.38                    MI
8    10          Tacoma            Washington  37.7        100914.0         107036.0           207950            19040.0             31863.0       2.48                    WA
9    12          Eagan             Minnesota   36.8        31587.0          34701.0            66288             2699.0              8642.0        2.49                    MN

Last rows

     Unnamed: 0  City              State       Median Age  Male Population  Female Population  Total Population  Number of Veterans  Foreign-born  Average Household Size  State Code
370  1581        Vancouver         Washington  37.2        82958.0          89895.0            172853            12391.0             21748.0       2.49                    WA
371  1604        Minneapolis       Minnesota   32.4        206547.0         204388.0           410935            15217.0             70769.0       2.26                    MN
372  1609        San Buenaventura  California  37.7        53932.0          55785.0            109717            5980.0              18025.0       2.68                    CA
373  1645        Harlingen         Texas       30.1        32404.0          33365.0            65769             2761.0              10391.0       3.00                    TX
374  1651        San Leandro       California  41.8        43032.0          47679.0            90711             4111.0              34293.0       2.79                    CA
375  1657        Tyler             Texas       33.9        50422.0          53283.0            103705            4813.0              8225.0        2.59                    TX
376  1762        Menifee           California  37.1        42866.0          44297.0            87163             6821.0              12481.0       3.06                    CA
377  1774        Gainesville       Florida     26.0        60803.0          69330.0            130133            4788.0              15272.0       2.33                    FL
378  1931        Bellflower        California  33.4        38936.0          39498.0            78434             2154.0              24607.0       3.58                    CA
379  1963        Camden            New Jersey  27.9        36437.0          39694.0            76131             1425.0              11317.0       3.00                    NJ